Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-42458: Modify variance plane when injecting sources and optionally vary injected light #29

Merged
merged 5 commits into from
Mar 5, 2024

Conversation

jjbuchanan
Copy link
Contributor

No description provided.

@timj
Copy link
Member

timj commented Jan 11, 2024

@jjbuchanan can you check that the line endings on your files haven't been modified? The diff is showing entire changes of file content even in the headers which haven't changed.

@jjbuchanan
Copy link
Contributor Author

@timj I've fixed the line endings, and the diffs look better now.

@jjbuchanan
Copy link
Contributor Author

jjbuchanan commented Jan 13, 2024

Fixed the whitespace and import order issues that were producing flake8 and isort errors.

Copy link
Contributor

@jmeyers314 jmeyers314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, James. I'd still like to run this through a few pipelines to make sure everything makes sense there, but thought I'd give you some initial feedback today.

In addition to the line-by-line stuff, I think you can probably add some unit tests. Things like comparing add_noise=True with add_noise=False with otherwise identical inputs.

@@ -72,6 +112,13 @@ class CoaddInjectTask(BaseInjectTask):
_DefaultName = "coaddInjectTask"
ConfigClass = CoaddInjectConfig

def run(self, injection_catalogs, input_exposure, psf, photo_calib, wcs):
self.log.info("Fitting flux vs. variance in each pixel.")
self.config.variance_scale = self.get_variance_scale(input_exposure)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should pass calculated values through the config. The config gets written to the butler once per collection, so having values that differ (from exposure to exposure) within a collection will get lost.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I can instead pass variance_scale as an argument to BaseInjectTask.run(), on the same footing as things like photo_calib.

@@ -82,3 +129,124 @@ def runQuantum(self, butler_quantum_context, input_refs, output_refs):
input_keys = ["injection_catalogs", "input_exposure", "sky_map", "psf", "photo_calib", "wcs"]
outputs = self.run(**{key: value for (key, value) in inputs.items() if key in input_keys})
butler_quantum_context.put(outputs, output_refs)

def get_variance_scale(self, exposure):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add a docstring here explaining the basic algorithmic idea. Why there are two rounds of RANSAC, for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I came up with one. The algorithm has multiple parts that need explaining so it's a little verbose, but hopefully okay.

@@ -82,3 +129,124 @@ def runQuantum(self, butler_quantum_context, input_refs, output_refs):
input_keys = ["injection_catalogs", "input_exposure", "sky_map", "psf", "photo_calib", "wcs"]
outputs = self.run(**{key: value for (key, value) in inputs.items() if key in input_keys})
butler_quantum_context.put(outputs, output_refs)

def get_variance_scale(self, exposure):
x = exposure.image.array.flatten()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these variables are large enough in scope to get spelled out. Maybe flux and var or variance, but not x and y.

Also, I think .flatten will always produce a copy, where .ravel will only copy when necessary. So as long as these are read-only (I think that's right?) .ravel might be a tad faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree on the variable names.

Interesting point about flatten vs. ravel. I'll go with ravel, while noting that I'm also changing the way that nan pixels are handled.

bad_pixels = ~np.isfinite(x) | ~np.isfinite(y)
# Replace bad pixel values with the image median
if np.sum(bad_pixels) > 0:
median_image_value = np.median(x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably needs to be np.nanmedian?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of replacing nan pixels with the median value, I think it may be best to just ignore them, by assigning flux = flux[good_pixels].

kmeans.labels_ == np.argmax(label_counts)
]
if len(stable_fit_seeds == 0):
# Throw a warning
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to find an appropriate warning to raise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented, and another one at the very end if the final slope ends up being negative.


# If the smooth image can be drawn successfully,
# we can do everything else.
if draw_succeeded:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the try/except either succeed, continue (on GalSimFFTSizeError), or halt (on any other error)?
I think this is always true, so unneeded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

if add_noise:
# For generating noise,
# variance must be meaningful.
if np.sum(variance_template.array < 0) > 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing np.any is a hair faster. It has the possibility to short-circuit for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

@@ -104,6 +108,13 @@ class VisitInjectTask(BaseInjectTask):
_DefaultName = "visitInjectTask"
ConfigClass = VisitInjectConfig

def run(self, injection_catalogs, input_exposure, psf, photo_calib, wcs):
self.log.info("Fitting flux vs. variance in each pixel.")
self.config.variance_scale = self.get_variance_scale(input_exposure)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as earlier, I don't think we can pass calculated values through the config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as noted above.

@@ -128,3 +139,35 @@ def runQuantum(self, butler_quantum_context, input_refs, output_refs):
input_keys = ["injection_catalogs", "input_exposure", "sky_map", "psf", "photo_calib", "wcs"]
outputs = self.run(**{key: value for (key, value) in inputs.items() if key in input_keys})
butler_quantum_context.put(outputs, output_refs)

def get_variance_scale(self, exposure):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a docstring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

y[bad_pixels] = median_variance_value

# Only fit pixels with at least this much inst flux
brightness_cutoff = 500
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might suggest that this threshold should be stored and thus accessible in the config for the Task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will do.

ransac = RANSACRegressor(
loss="squared_error",
residual_threshold=self.config.threshold_scale_1 * linear_mse,
max_trials=1000,
Copy link

@wmwv wmwv Feb 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this hard-coded value for max_trials be accessible at the function keyword level, or even the Task config level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to put it at the function keyword level at least, and I'm fine with putting it in the Task config if people are happy with the proliferation of config options. Ideally this value should be high enough so that it's never the thing that causes a pathology in practice, and it seems like scikit-learn's default value is a little low, which is why I specify a different value here.

# K-means cluster the first round of fits,
# to find the most stable results.
kmeans = KMeans(
n_clusters=self.config.n_clusters_1, random_state=self.config.kmeans_seed_1, n_init=10
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should n_init be accessible at the Task config level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense. I can put it there.

@jjbuchanan jjbuchanan changed the title Modify variance plane when injecting sources and optionally vary injected light DM-42458: Modify variance plane when injecting sources and optionally vary injected light Feb 14, 2024
@jmeyers314 jmeyers314 merged commit f4d6588 into lsst:tickets/DM-42458 Mar 5, 2024
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants